Pub-Sub API Design Evaluation and Latency Budget

Introduction#

In the previous lesson, we created the API for the pub-sub service to meet the functional requirements identified earlier. This lesson focuses on the techniques employed to meet the non-functional requirements and the tradeoffs between them. We also estimate the response time of the pub-sub API to analyze whether its latency falls within the ideal range. Finally, after analyzing all aspects of the API design, we discuss the notification service as a use case of the pub-sub service.

Non-functional requirements#

The following sections discuss how the pub-sub service meets the non-functional requirements:

Scalability and availability #

The replication of brokers and clusters enhances the scalability and aids in the availability of the API. Partitioning helps to distribute messages or events to multiple queues based on topics or message type. This enables the system to scale horizontally and distribute the workload across multiple brokers without blocking or slowing down the service. Moreover, asynchronous processing of events from different partitions allows the pub-sub service to efficiently scale and handle peak loads without compromising performance. Load balancing and horizontal scaling also enable the pub-sub service to scale to handle numerous events and users and remain available.

We can also rate-limit requests from publishers and subscribers to avoid overwhelming the system. Similarly, we monitor the API to receive timely alerts and mitigate any issues.

Security #

Users can request a list of available topics to subscribe to without logging in; in this case, we require clients to present an API key as a security check. To send a subscription request, the user must be logged in and provide both an API key and an access token. To authenticate that a subscription request comes from the intended user, we introduce verification of intent (VOI) as an extra security layer. VOI prevents unauthorized users from making subscription requests on behalf of a subscriber. It works by sending a verification request to the subscriber’s callBack URL: a GET request embedded with a challenge parameter, which the subscriber fulfills by echoing the challenge in its response.

Verification of intent from the subscriber
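The challenge-echo exchange above can be sketched as follows. This is a minimal illustration, not the service's actual implementation: the function names are hypothetical, and the HTTP round trip to the callBack URL is simulated with a direct function call.

```python
import secrets

def build_verification_request(callback_url: str) -> tuple[str, str]:
    # The pub-sub service generates a random challenge and embeds it as a
    # query parameter in a GET request to the subscriber's callBack URL.
    challenge = secrets.token_urlsafe(16)
    return f"{callback_url}?challenge={challenge}", challenge

def subscriber_endpoint(query_params: dict) -> str:
    # A legitimate subscriber proves its intent by echoing the challenge verbatim.
    return query_params["challenge"]

def verify_intent(callback_url: str) -> bool:
    _url, expected = build_verification_request(callback_url)
    # In production this is an HTTP round trip; here the subscriber's handler
    # is invoked directly to illustrate the exchange.
    echoed = subscriber_endpoint({"challenge": expected})
    return echoed == expected
```

An attacker who registered someone else's callBack URL would fail this check, because the legitimate endpoint would not echo a challenge for a subscription it never requested.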

Low latency#

The latency of the pub-sub service is reduced significantly by asynchronous communication: decoupled services spend minimal time waiting on each other. Moreover, parallel processing across replicated brokers lowers the latency of different operations. Caching the list of topics and the events that need to be pushed to multiple subscribers can further reduce latency.
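As one concrete example of the caching idea, the frequently requested topics list could be held in a small time-to-live (TTL) cache so that repeated GET requests avoid a broker round trip. This is a sketch under assumed behavior; the class name and TTL value are illustrative, not part of the service's design.

```python
import time

class TopicCache:
    """A minimal TTL cache for the frequently requested topics list."""

    def __init__(self, ttl_seconds: float = 30.0):
        self.ttl = ttl_seconds
        self._value = None
        self._expires_at = 0.0

    def get(self, load_topics):
        # Serve from the cache while fresh; otherwise reload from the broker.
        now = time.monotonic()
        if self._value is None or now >= self._expires_at:
            self._value = load_topics()
            self._expires_at = now + self.ttl
        return self._value
```

With a 30-second TTL, a burst of list-topics requests hits the broker only once per window, trading a little staleness for lower latency.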

Achieving Non-Functional Requirements

| Non-Functional Requirements | Approaches |
|---|---|
| Scalability and availability | Cluster and broker replication; partitioning for distribution of messages; load balancing and horizontal scaling; rate limiting; API monitoring |
| Security | API key; JWT for authorization; VOI |
| Low latency | Asynchronous communication of services; parallel communication of events; caching |

Latency budget#

This section aims to estimate the response time of the pub-sub API. We can estimate the response time by considering two requests: listing topics (the GET request) and creating a topic (the POST request). Because the response time varies according to data sizes, we first estimate the sizes of both the request and response. In the next step, we calculate the estimated response time of the pub-sub service.

Note: As discussed in the Back-of-the-Envelope Calculations for Latency chapter, the latency of GET and POST requests is affected by different parameters. For GET, the average RTT remains the same regardless of the data size (due to the small request size), while the time to download the response grows by 0.4 ms per KB. For POST, the RTT grows with the data size by 1.15 ms per KB beyond the base RTT of 260 ms.
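The parameters in the note can be wrapped in a small latency calculator, which we can reuse for both requests below. The constants come from the note and the calculations that follow (base time bounds of 120.5 ms and 201.5 ms, GET RTT of 70 ms); the function names are our own, and the POST formula assumes the ~1 KB response still incurs the 0.4 ms/KB download cost.

```python
# All times in milliseconds; constants taken from this lesson's latency model.
BASE_MIN, BASE_MAX = 120.5, 201.5   # minimum and maximum base time
RTT_GET = 70.0                      # GET RTT, independent of data size
RTT_POST_BASE = 260.0               # POST base RTT
DOWNLOAD_PER_KB = 0.4               # response download cost per KB
UPLOAD_PER_KB = 1.15                # POST request upload cost per KB

def get_latency(response_kb: float) -> tuple[float, float]:
    # GET latency: base time + fixed RTT + response download time.
    extra = RTT_GET + DOWNLOAD_PER_KB * response_kb
    return BASE_MIN + extra, BASE_MAX + extra

def post_latency(request_kb: float, response_kb: float = 1.0) -> tuple[float, float]:
    # POST latency: base time + size-dependent RTT + small response download.
    extra = RTT_POST_BASE + UPLOAD_PER_KB * request_kb + DOWNLOAD_PER_KB * response_kb
    return BASE_MIN + extra, BASE_MAX + extra
```

Calling `get_latency(6)` and `post_latency(2)` reproduces the minimum and maximum latencies derived in the two subsections below.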

List topics#

First, let's start by estimating the request and response sizes for this GET request.

  • Request size: The request is simple, so we'll consider approximately 1 KB as the request size, including headers and query parameters, with no request body.

  • Response size: The response includes a list of topics. If a single page contains 50 topics per request, the total response size is 6 KB (data = 5 KB, headers = 1 KB). We consider only the response size because it dominates the response time; the request size is minimal and is covered by the request's RTT.

Response time: Let’s now estimate the latency and response time. In EDA, services work asynchronously without depending on each other's responses, so we assume the minimum processing time of 4 ms, as discussed in the latency budget chapter earlier in the course.

The response time for getting a list of the topics is given below:

Response Time Calculator to List Topics (response size = 6 KB):

  • Minimum latency: 192.9 ms
  • Maximum latency: 273.9 ms
  • Minimum response time: 196.9 ms
  • Maximum response time: 277.9 ms

Assuming the response size is 6 KB, the latency is calculated by:

Time_{latency\_min} = Time_{base\_min} + RTT_{get} + 0.4 \times size\ of\ response\ (KB) = 120.5 + 70 + 0.4 \times 6 = 192.9\ ms

Time_{latency\_max} = Time_{base\_max} + RTT_{get} + 0.4 \times size\ of\ response\ (KB) = 201.5 + 70 + 0.4 \times 6 = 273.9\ ms

Similarly, the response time is calculated using the following equation:

Time_{Response} = Time_{latency} + Time_{processing}

Now, for the minimum response time, we use the minimum values of base time and processing time:

Time_{Response\_min} = Time_{latency\_min} + Time_{processing\_min} = 192.9\ ms + 4\ ms = 196.9\ ms

Now, for the maximum response time, we use the maximum values of base time and processing time:

Time_{Response\_max} = Time_{latency\_max} + Time_{processing\_max} = 273.9\ ms + 4\ ms = 277.9\ ms

Create topic#

  • Request size: To create a topic, the request contains a body with the topic's data along with other essential headers, so the estimated size is approximately 2 KB, including the parameters needed to create a topic.

  • Response size: The response generally contains the status code and is approximately 1 KB, including the header.

For a POST request, the response time is affected mainly by the request size, so we use the request size for estimation; the response size is negligible and is covered by the RTT. For the create-topic operation, the response time is as follows:

Response Time Calculator to Create Topic (request size = 2 KB):

  • Minimum latency: 383.2 ms
  • Maximum latency: 464.2 ms
  • Minimum response time: 387.2 ms
  • Maximum response time: 468.2 ms

Assuming the request size is 2 KB:

Time_{latency} = Time_{base} + RTT_{post} + Time_{download}

RTT_{post} = RTT_{base} + 1.15 \times size\ of\ request\ (KB) = 260 + 1.15 \times 2 = 262.3\ ms

Time_{latency\_min} = Time_{base\_min} + (RTT_{base} + 1.15 \times size\ of\ request\ (KB)) + 0.4 \times size\ of\ response\ (KB)

= 120.5 + (260 + 1.15 \times 2) + 0.4 \times 1 = 383.2\ ms

Time_{latency\_max} = Time_{base\_max} + (RTT_{base} + 1.15 \times size\ of\ request\ (KB)) + 0.4 \times size\ of\ response\ (KB)

= 201.5 + (260 + 1.15 \times 2) + 0.4 \times 1 = 464.2\ ms

Similarly, the response time is calculated as follows:

Time_{Response\_min} = Time_{latency\_min} + Time_{processing\_min} = 383.2\ ms + 4\ ms = 387.2\ ms

Time_{Response\_max} = Time_{latency\_max} + Time_{processing\_max} = 464.2\ ms + 4\ ms = 468.2\ ms

Response time for the list topics and create a topic requests

Notifications as a pub-sub service#

Now that we have analyzed all the aspects of designing a pub-sub API, we will discuss its use case as a notification service. As mentioned in the requirements lesson, it will serve as a building block in many design problems. In the real world, almost every application has a mechanism to send its clients notifications of events for a better user experience.

Notification delivery is one of the key features of the pub-sub service: whenever an event is triggered, the service notifies the subscribers. We will analyze how the pub-sub service delivers notifications to subscribers and manages all the data related to notification delivery. These notifications can include data or simple messages informing the user about a specific event. The pub-sub service and similar services provide notification functionality alongside their other capabilities.

The following illustration is a recall of the creation and delivery of the notification of an event to the subscriber:

Event creation and notifying the subscribers

An important question that comes to mind is whether a pub-sub service provides guaranteed delivery of the notification. The answer depends on the way the notification delivery is implemented, which can be one of the following semantics:

  • At-most-once: This semantic refers to pushing a notification to the subscriber and hoping it gets delivered. If the subscriber somehow doesn't receive it, the message is lost. The pub-sub service sends the message and updates its status immediately, regardless of delivery.

  • At-least-once: This refers to pushing a notification periodically until an acknowledgment is received from the subscriber. In this scenario, duplicate delivery can occur. The pub-sub service sends a message and only updates the status when the acknowledgment is received; otherwise, it retries.

  • Exactly-once: This delivery semantic refers to pushing the notification for delivery exactly once. To preserve atomicity, no duplicate messages are delivered, and the status at the backend is updated only after the message has been delivered. The exactly-once semantic is difficult to achieve over a network like the Internet.
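The at-least-once semantic is worth making concrete, since it is the one most push systems implement in practice. The sketch below is illustrative, with hypothetical names; `send` stands in for the HTTP push to the subscriber and returns `True` only when an acknowledgment is received.

```python
def deliver_at_least_once(send, message, max_retries: int = 5) -> bool:
    """Push a message until the subscriber acknowledges it, up to max_retries.

    Duplicates are possible: a subscriber may process the message and then
    have its acknowledgment lost in transit, triggering a redundant resend.
    """
    for _attempt in range(max_retries):
        if send(message):
            return True   # ack received: update the delivery status at the backend
    return False          # no ack: leave the message pending for a later retry cycle
```

Subscribers therefore need idempotent handlers (e.g., deduplicating on a message ID) if processing a notification twice would be harmful.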

All three scenarios are explained in the following illustration, where pub-sub sends events A, B, and C to the subscribers:

Notification delivery options in the pub-sub service

Workflow of the notification service#

A relevant event is generated when a publisher creates a topic using the pub-sub API. The event is sent to the pub-sub service, which filters the event and enqueues it to the relevant partition queue of the topic, as shown in the first lesson of the chapter. The pub-sub service then pushes the event to the subscribers based on a specific delivery semantic and updates the delivery status in the notification database accordingly.
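The enqueue step of this workflow can be sketched as topic-based partition routing. This is a simplified model under stated assumptions: the partition count, hash scheme, and in-memory queues are illustrative stand-ins for the broker's real partitioned storage.

```python
from collections import defaultdict
from hashlib import sha256

NUM_PARTITIONS = 4  # illustrative partition count

def partition_for(topic: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # A stable hash routes every event of a topic to the same partition queue,
    # so ordering within a topic is preserved while load spreads across brokers.
    digest = sha256(topic.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

queues = defaultdict(list)  # partition id -> enqueued events

def publish(topic: str, event: dict) -> None:
    # Filter/route the event into the relevant partition queue of its topic.
    queues[partition_for(topic)].append({"topic": topic, **event})
```

Because routing is deterministic, consumers draining a given partition see each topic's events in publication order, matching the workflow described above.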

The subsequent section discusses the format of the message.

Message format#

All event notifications are sent through an HTTP POST request to the pre-configured endpoint, which is defined during subscription using the subscriber's callBack URL. The subscriber acknowledges a delivered notification by returning an HTTP success status, such as 200 OK. If a message is lost, the returned HTTP status includes an error code indicating that the pub-sub service needs to resend the notification to the subscriber.

A message sent to the subscriber for a triggered event must contain at least one attribute. The message follows the JSON data format and is sent as an HTTP POST request in the following format:

The HTTP request format to push notification
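A notification payload in this format might be constructed as follows. The field names (`message_id`, `topic`, `data`, `attributes`) are illustrative assumptions, not the service's exact schema; the lesson only requires a JSON message carrying at least one attribute, delivered via HTTP POST to the callBack URL.

```python
import json

# Hypothetical notification body for a triggered event; the field names are
# illustrative, and only "at least one attribute" is required by the design.
notification = {
    "message": {
        "message_id": "msg-1024",                        # assumed unique delivery id
        "topic": "orders",                               # topic the event belongs to
        "data": "Order shipped",                         # event payload
        "attributes": {"event_type": "order.shipped"},   # at least one attribute
    }
}

# Serialized JSON body for the HTTP POST to the subscriber's callBack URL.
body = json.dumps(notification)
```

A unique message ID, as sketched here, also lets subscribers deduplicate redundant deliveries under the at-least-once semantic.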

The notification service is one of the many use cases of the pub-sub service, and we'll use it as a building block in upcoming design problems. Another primary use case is enabling asynchronous communication in a microservice architecture by decoupling services.

Summary#

In this chapter, we started by learning why we should opt for event-driven over request-response architecture. The workflow of the pub-sub service defined how the publisher-subscriber model works. In the API model lesson, we learned how to define different endpoints and message formats to communicate with those endpoints. Later in this chapter, we also learned how the pub-sub service could be used as a notification service for major design problems.
